Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition
Abstract
The search results clustering problem is defined as the automatic, online grouping of similar documents in a list of results returned by a search engine. In this paper we present Lingo, a novel algorithm for clustering search results that emphasizes the quality of cluster descriptions. We describe the methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phrase extraction using suffix arrays. Finally, we discuss the results of an empirical evaluation of the algorithm.
"Knowledge is of two kinds: we know a subject ourselves, or we know where we can find information about it." (Samuel Johnson, 1775)
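The abstract names frequent phrase extraction using suffix arrays as one of the algorithm's building blocks. The following is a minimal sketch of that general idea, not the Lingo implementation: it builds a word-level suffix array naively (by sorting suffix start positions) and counts word n-grams that repeat. The function name `frequent_phrases` and the parameters `min_count` and `max_len` are hypothetical, chosen only for illustration.

```python
def frequent_phrases(tokens, min_count=2, max_len=4):
    """Return {phrase: count} for word n-grams seen at least min_count times.

    Illustrative sketch only: the suffix array is built by plain sorting,
    which is simple but far slower than dedicated construction algorithms.
    """
    # Suffix array: start positions ordered by the token suffix they begin.
    suffixes = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    counts = {}
    for n in range(1, max_len + 1):
        run_phrase, run_len = None, 0
        for start in suffixes:
            phrase = tuple(tokens[start:start + n])
            if len(phrase) < n:
                phrase = None  # suffix is too short to hold an n-gram
            if phrase is not None and phrase == run_phrase:
                # Equal n-gram prefixes are adjacent in sorted suffix order.
                run_len += 1
                continue
            if run_phrase is not None and run_len >= min_count:
                counts[" ".join(run_phrase)] = run_len
            run_phrase, run_len = phrase, (1 if phrase is not None else 0)
        # Flush the last run for this n-gram length.
        if run_phrase is not None and run_len >= min_count:
            counts[" ".join(run_phrase)] = run_len
    return counts


if __name__ == "__main__":
    text = "search results clustering groups search results by topic"
    print(frequent_phrases(text.split(), max_len=3))
```

Because suffixes that share a prefix sit next to each other in the sorted order, every repeated phrase shows up as a contiguous run, so a single pass per phrase length is enough to count it.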
Similar resources
Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members
Graphs have many applications in real-world problems. When we deal with huge volumes of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...
A Dimensionless Parameter Approach based on Singular Value Decomposition and Evolutionary Algorithm for Prediction of Carbamazepine Particles Size
The particle size control of drug is one of the most important factors affecting the efficiency of the nano-drug production in confined liquid impinging jets. In the present research, for this investigation the confined liquid impinging jet was used to produce nanoparticles of Carbamazepine. The effects of several parameters such as concentration, solution and anti-solvent flow rate and solvent...
Test of BibTeX references
[1] J. Stefanowski and D. Weiss, “Carrot2 and language properties in web search results clustering,” in Proceedings of AWIC-2003, First International Atlantic Web Intelligence Conference, ser. Lecture Notes in Computer Science, E. M. Ruiz, J. Segovia, and P. S. Szczepaniak, Eds., vol. 2663. Madrid, Spain: Springer, 2003, pp. 240–249. [Online]. Available: http://www.cs.put.poznan.pl/dweiss/xml/ ...
Noise Effects on Modal Parameters Extraction of Horizontal Tailplane by Singular Value Decomposition Method Based on Output Only Modal Analysis
Given the great importance of safety in the aerospace industry, identifying the dynamic parameters of related equipment through experimental tests in operating conditions has been a focus of attention. Due to the existence of noise sources in these conditions, the probability of fault occurrence may increase. This study investigates the effects of noise in the process of modal parameters identification b...
Semantic, Hierarchical, Online Clustering of Web Search Results
Today, the search engine is the most commonly used tool for Web information retrieval; however, its current state is still far from satisfactory. This paper focuses on clustering Web search results in order to help users find relevant Web information more easily and quickly. The main contributions of this paper include the following. (1) The benefits of using key phrases as natural language inform...
Publication date: 2004